Online Japanese Unknown Morpheme Detection using Orthographic Variation

Authors

  • Yugo Murawaki
  • Sadao Kurohashi
Abstract

To solve the unknown morpheme problem in Japanese morphological analysis, we previously proposed a novel framework for online unknown morpheme acquisition, together with an implementation. This framework poses a previously unexplored problem: online unknown morpheme detection, the task of finding morphemes in each sentence that are not listed in a given lexicon. Unlike in English, this is a non-trivial task because Japanese does not delimit words with white space. We first present a baseline method that simply uses the output of the morphological analyzer. We then show that it fails to detect some unknown morphemes because they are over-segmented into shorter registered morphemes. To cope with this problem, we present a simple solution: exploiting orthographic variation in Japanese. Under the assumption that orthographic variants behave similarly, each over-segmentation candidate is checked against its counterparts. Experiments show that the proposed method improves the recall of detection and contributes to improving unknown morpheme acquisition.
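The over-segmentation problem and the variant check described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a greedy longest-match segmenter in place of a real morphological analyzer, an invented lexicon and invented sentences, and it models only one kind of orthographic variation (hiragana versus katakana). All function names are hypothetical.

```python
def to_katakana(text):
    # Hiragana U+3041..U+3096 maps onto katakana by adding 0x60.
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text
    )


def segment(text, lexicon):
    """Toy analyzer: greedy longest match, returns (surface, is_known) pairs."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                tokens.append((text[i:j], True))
                i = j
                break
        else:
            # No registered morpheme starts here: emit one unknown character.
            tokens.append((text[i], False))
            i += 1
    return tokens


def baseline_unknowns(tokens):
    """Baseline: join adjacent unregistered characters into candidates."""
    found, buf = [], ""
    for surface, known in tokens:
        if known:
            if buf:
                found.append(buf)
            buf = ""
        else:
            buf += surface
    if buf:
        found.append(buf)
    return found


def variant_unknowns(tokens, detected, max_span=4):
    """Flag runs of registered tokens whose katakana counterpart is in `detected`."""
    found = []
    for start in range(len(tokens)):
        last = min(start + max_span, len(tokens))
        for end in range(start + 2, last + 1):
            span = tokens[start:end]
            if all(known for _, known in span):
                surface = "".join(s for s, _ in span)
                if to_katakana(surface) in detected:
                    found.append(surface)
    return found


# Invented lexicon: no entry for "もふもふ" or "モフモフ".
LEXICON = {"も", "ふ", "の", "犬", "猫"}

# The katakana spelling is caught by the baseline.
detected = set(baseline_unknowns(segment("モフモフの犬", LEXICON)))

# The hiragana spelling is over-segmented into registered one-character morphemes.
tokens = segment("もふもふの猫", LEXICON)
missed_by_baseline = baseline_unknowns(tokens)
recovered = variant_unknowns(tokens, detected)
```

In this toy setting the baseline finds nothing in the hiragana sentence because every character happens to be a registered morpheme, while the variant check recovers the span by matching it against the katakana form detected earlier. A real system would need a far richer notion of variants and of "behaving similarly" than this set lookup.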


Similar resources

Online Acquisition of Japanese Unknown Morphemes using Morphological Constraints

We propose a novel lexicon acquirer that works in concert with the morphological analyzer and has the ability to run in online mode. Every time a sentence is analyzed, it detects unknown morphemes, enumerates candidates and selects the best candidates by comparing multiple examples kept in the storage. When a morpheme is unambiguously selected, the lexicon acquirer updates the dictionary of the...


Recognition and Verification of Engl for Computer-assisted Languag

We address methods for recognizing English spoken by Japanese students as the basis for our Computer-Assisted Language Learning (CALL) system. For automatic phonemic error detection, pronunciation error prediction is executed for a given orthographic text. To improve reliability, speaker adaptation and segment-input pair-wise verification are applied as pre-processing and post-processing, respe...


Automatic Bunsetsu Segmentation of Japanese Sentences Using a Classification Tree

Bunsetsu, which is comprised of a content word followed by, possibly 0, function words, is a convenient unit for dependency structure analysis of Japanese. There are, however, no spaces indicating bunsetsu boundaries in the orthographic writing of Japanese. Thus a sentence must be segmented into bunsetsu's by some means prior to dependency structure analysis. Conventionally, such segmentation h...



Orthographic Reading Deficits in Dyslexic Japanese Children: Examining the Transposed-Letter Effect in the Color-Word Stroop Paradigm

In orthographic reading, the transposed-letter effect (TLE) is the perception of a transposed-letter position word such as "cholocate" as the correct word "chocolate." Although previous studies on dyslexic children using alphabetic languages have reported such orthographic reading deficits, the extent of orthographic reading impairment in dyslexic Japanese children has remained unknown. This st...



Publication date: 2010